ABSTRACT

Mitochondrial genome diversity in closely related 
species provides an excellent platform for investiga- 
tion of chromosome architecture and its evolution 
by means of comparative genomics. In this study, 
we determined the complete mitochondrial DNA se- 
quences of eight Candida species and analyzed 
their molecular architectures. Our survey revealed 
a puzzling variability of genome architecture, 
including circular- and linear-mapping and multi- 
partite linear forms. We propose that the arrange- 
ment of large inverted repeats identified in these 
genomes plays a crucial role in alterations of their 
molecular architectures. In specific arrangements, 
the inverted repeats appear to function as resolution 
elements, allowing genome conversion among dif- 
ferent topologies, eventually leading to genome 
fragmentation into multiple linear DNA molecules. 
We suggest that molecular transactions generating 
linear mitochondrial DNA molecules with defined 
telomeric structures may parallel the evolutionary 
emergence of linear chromosomes and multipartite 
genomes in general and may provide clues for the 
origin of telomeres and pathways implicated in their 
maintenance. 

INTRODUCTION 

A genome fragmented into multiple linear chromosomes
terminating with telomeric arrays is a hallmark of the eu-
karyotic cell. In contrast, molecular architectures of
genomes in prokaryotes and organelles vary substantially
(1). For instance, certain animal mitochondrial DNAs 
(mtDNAs) are monomeric circles (2), kinetoplastid 
protists have networks of catenated circles (3,4), and 
most plant and fungal mitochondria contain linear (cir-
cularly permuted) concatemers that are heterogeneous in
size (termed polydisperse linear DNA) (5-7). Finally,
uniform linear mtDNAs terminating with defined
terminal structures (mitochondrial telomeres) are found
in a number of phylogenetically diverse taxa (8-28). In
addition, mitochondria of numerous organisms contain
a multipartite genome, i.e. one fragmented into multiple (from
a few to several hundred) circular or linear chromosomes
(29-39).

The predominant genome architecture may even differ 
in closely related organisms, i.e. conceptually different 
(monomeric linear versus circular-mapping and linear 
polydisperse) or containing varying proportions of topo- 
logically different mtDNA molecules. For example, 
mitochondria of Candida glabrata and Saccharomyces
cerevisiae contain polydisperse linear DNA molecules,
with a minor fraction of circles and lariat structures
generated by rolling circle replication (40). In contrast, a
recent study revealed that mitochondria of C. albicans lack
significant amounts of circular mtDNA molecules, con-
taining predominantly a network of branched DNA struc-
tures with linear polydisperse mtDNA molecules, which was
interpreted as recombination-driven replication (41,42).
An alternative interpretation would be that mitochondrial rep-
lication proceeds as in S. cerevisiae, with a reduced level of
circular replicative DNA molecules due to more effective
recombination, which is also responsible for the branched struc-
tures. Whatever the replication mechanism, physical
mapping approaches such as restriction mapping, DNA
sequencing or PCR amplification indicate that any of
these populations of linear polydisperse mtDNA mol-
ecules have predominantly circularly permuted sequences
(i.e. can be reasonably well represented as single sequence
records as deposited in GenBank, but should not be
labeled circular as enforced by the database management).
Such genome architectures are in most cases illustrated as 
circular maps. In contrast, species containing linear 
mtDNA molecules terminating with specific telomeric 
structures are best depicted in linear maps (43). 
Accordingly, we term mitochondrial genomes as 
circular- or linear-mapping. 

However, the picture is more complex in some yeast 
species with multiple forms of mitochondrial genomes. 
In Pichia pijperi and Williopsis mrakii, the linear 
mtDNA molecules terminating with telomeric hairpins 
(t-hairpins) coexist with monomeric and dimeric circles 
and linear polydisperse mtDNAs (11,44,45). Two types
of DNA replicons occur in mitochondria of
C. parapsilosis, namely the linear mtDNA molecules
with telomeric arrays (t-arrays) of tandem repeats and
multimeric minicircles derived exclusively from the
sequence of mitochondrial telomeres (telomeric circles,
t-circles) implicated in the telomere maintenance
pathway (16,20,46-48).

At present, little is known about the biological roles of 
different mtDNA forms and molecular mechanisms 
leading to architectural alterations of the mitochondrial 
genomes. Our ambition is to identify these mechanisms 
and to uncover their role in the evolution of linear 
chromosomes and corresponding telomeric structures. 
Therefore, we initiated a large-scale comparative study 
of the mitochondrial genomes in yeast species closely 
related to C. parapsilosis, whose mitochondrial telomeres 
share a number of structural features with their counter- 
parts at the ends of nuclear chromosomes (49,50). In 
previous reports, we described that strains of
C. metapsilosis and C. orthopsilosis possess either a
linear-mapping mitochondrial genome, with an archi-
tecture similar to that found in C. parapsilosis, or a circularized
(mutant) form of the genome (51,52). Moreover, we
found that C. subhashii contains yet another type of
linear mitochondrial genome, which does not come with
any detectable circular or concatemeric form. Instead, its
linear mtDNA terminates with invertron-like telomeres,
with a protein covalently bound to both 5' termini (10).
The four Candida species containing linear mitochondrial
genomes are classified within the monophyletic 'CTG 
clade' of Hemiascomycetes (53,54). The same phylogenetic 
group also contains species with circular-mapping mito- 
chondrial genomes such as C. albicans (55), Debaryomyces 
hansenii (56) and Pichia sorbitophila (57). The occurrence 
of closely related organisms or even strains of the same 
species with different mitochondrial genome architecture 
is in line with the hypothesis that linear- and
circular-mapping mitochondrial genomes do not exhibit 
a radical difference, but that the genome forms may spor- 
adically interconvert via currently unknown molecular 
mechanism(s) (11,52). 

In this study, we analyze the complete mtDNA se-
quences of eight additional Candida species. Our survey
reveals that their molecular forms vary dramatically,
providing a unique opportunity for identification of struc-
tural elements and molecular mechanisms affecting the
genome architecture. At the same time, our analysis
provides an insight into the evolution of linear chromo-
somes and their telomeric structures.

MATERIALS AND METHODS 

Yeast strains and cultivation 

Yeast strains analyzed in this study are listed in Table 1.
Yeasts were grown in liquid YPDG media (1% (w/v) yeast
extract, 1% (w/v) peptone, 0.5% (w/v) glucose and 3% 
(v/v) glycerol), with constant shaking at 25-30°C until 
the late exponential phase. 

Pulsed field gel electrophoresis 

Screening for linear mitochondrial genomes was per-
formed using the pulsed field gel electrophoresis (PFGE)
approach (10,11). Briefly, whole-cell DNA samples were
prepared in agarose blocks, and separated in a 1.5% (w/v)
agarose gel using a CHEF Mapper XA Chiller System
(Biorad) with pulse switching set at 5-20 s (linear
ramping and 120° angle) for 42 h at 5 V/cm. All separ-
ations were performed in 0.5x TBE buffer (45 mM Tris-
borate, 1 mM EDTA, pH 8.0) at 10°C.

DNA sequencing and mitochondrial genome annotation 

The mtDNA used for DNA sequencing was purified from 
isolated mitochondria. Procedures for mtDNA prepar- 
ation, DNA sequence analysis and contig assembly were 
described previously (10,46,51,58-60). Genome annota- 
tions were performed using the MFannot tool
(http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl),
and manually adjusted according to the
sequence alignments of deduced protein products with 
their homologs from closely related yeast species. Intron 
sequences were identified using MFannot, RNAweasel 
(61) and Rfam (62). The precise boundaries were con- 
firmed by alignments of the corresponding sequence with 
an intron-less gene from a related species. Putative protein 
products encoded by intronic open reading frames were 
identified by searching the Pfam database (63) and classi- 
fied accordingly. 

Phylogenetic analysis 

For phylogenetic analysis, we used amino acid se-
quences of a protein set (Atp6-Atp8-Atp9-Cob-Cox1-
Cox2-Cox3-Nad1-Nad2-Nad3-Nad4-Nad4L-Nad5-
Nad6) encoded by 23 mitochondrial genomes. The se-
quences were translated using translation table 4 (mold,
protozoan and coelenterate mitochondrial code), except
for S. cerevisiae, where translation table 3 (yeast mito-
chondrial code) applies. The multiple alignments were per-
formed by MUSCLE (64) and concatenated to one
alignment. Alignment columns with >50% of gaps were
filtered out, resulting in an alignment with 3932 sites. The
phylogenetic tree was built with three different programs:
PhyloBayes with the CAT substitution model (65),
MrBayes (66) with the JTT model of amino acid substitution
(67) and γ-distributed rate variation between sites, and
PhyML (68) with the JTT model. Application of all three
programs gives the same tree topology. The only differ-
ences occur within the C. metapsilosis-C. orthopsilosis-
C. parapsilosis clade due to high sequence similarity
among these species. In the rest of the tree, all branches
are highly supported by posterior probabilities (above 0.9
in MrBayes and PhyloBayes), and most branches have a
bootstrap value of 100 in PhyML. Placement of C. alai
in the tree has high posterior probability in both Bayesian
programs, but low bootstrap values in PhyML.
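
The gap-filtering step applied to the concatenated alignment can be sketched as follows. This is a minimal illustration only, assuming the alignment is held as equal-length strings with '-' marking gaps; the function name and the toy sequences are hypothetical and are not taken from the original analysis pipeline.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: list of equal-length strings, '-' denotes a gap.
    Columns with exactly max_gap_fraction gaps are kept (only >50% is removed).
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        col
        for col in range(n_cols)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


# Hypothetical toy alignment: the third column has 3/4 gaps and is removed.
toy = ["MK-L", "M--L", "MKAL", "M--L"]
filtered = drop_gappy_columns(toy)
print(filtered)
```

The threshold is exposed as a parameter so that stricter or looser filters can be compared on the same alignment.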

Gene order comparison 

To infer possible ancestral gene order, we used protein 
coding, rRNA and tRNA genes of 16 mitochondrial 
genomes from the 'CTG clade'. Non-conserved genes 
that occur only in some species (i.e. trnM3 in D. hansenii
and P. sorbitophila; dpoBa, dpoBb and orf756 in
C. subhashii) were omitted in this analysis. We have recon- 
structed a possible history of rearrangements using a 
simple double cut and join model (DCJ) (69). The DCJ 
model is based on parsimony and includes commonly con-
sidered rearrangement operations, such as reversal, trans-
location, chromosome circularization, linearization,
fusion and fission. There is an efficient algorithm to 
compute the parsimonious distance of two genomes in 
DCJ model (69); however, an exact algorithm for inferring 
ancestral gene orders for a given set of present day 
genomes is not known. To infer the ancestral gene order 
under DCJ, we have implemented a local optimization 
procedure that in each iteration attempts to improve the 
solution by choosing new ancestral gene orders from a 
local neighborhood using a dynamic programming algo-
rithm (70). The DCJ model does not handle genomes with
duplicated genes. To resolve recent duplications in some
of the genomes (C. albicans, C. maltosa, C. sojae,
C. viswanathii), we removed duplicated genes, and
included both possible forms of the genomes as alterna-
tives in the corresponding leaves. Similarly, both isomers
are allowed in the genomes that include long inverted
repeats (C. alai, C. albicans, C. maltosa, C. neerlandica,
C. sojae, L. elongisporus). In each leaf, one of the alterna-
tive orders is chosen as a representative, so as to minimize
the overall parsimony cost. Finally, we penalize occur-
rence of multiple circular chromosomes, or combinations
of linear and circular chromosomes in ancestral genomes.
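
To make the pairwise DCJ distance concrete, the sketch below computes it for two genomes with identical, duplicate-free gene content using the standard adjacency-graph formula d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd-length paths. It is an illustration under stated assumptions, not the ancestral reconstruction procedure used in this study: genes are encoded as hypothetical signed integers, and the function names are invented for the example.

```python
def _ends(gene):
    """Return (left, right) extremities of a signed gene; 't' = tail, 'h' = head."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def neighbours(genome):
    """Map every gene extremity to its adjacent extremity (None at a telomere).

    genome: list of (genes, is_circular) chromosomes; genes are signed ints.
    """
    nbr = {}
    for genes, is_circular in genome:
        ends = [_ends(g) for g in genes]
        for (_, right), (left, _) in zip(ends, ends[1:]):
            nbr[right], nbr[left] = left, right
        first, last = ends[0][0], ends[-1][1]
        if is_circular:
            nbr[first], nbr[last] = last, first
        else:
            nbr[first] = nbr[last] = None
    return nbr


def dcj_distance(genome_a, genome_b):
    """DCJ distance between two genomes sharing the same single-copy genes."""
    na, nb = neighbours(genome_a), neighbours(genome_b)
    if set(na) != set(nb):
        raise ValueError("genomes must contain the same genes exactly once")
    n_genes = len(na) // 2
    seen, odd_paths, cycles = set(), 0, 0

    # Paths start at a telomere of either genome and alternate between genomes.
    for start_map, other_map in ((na, nb), (nb, na)):
        for ext in start_map:
            if start_map[ext] is not None or ext in seen:
                continue
            edges, cur_map, nxt_map = 0, other_map, start_map
            while True:
                seen.add(ext)
                edges += 1
                partner = cur_map[ext]
                if partner is None:  # reached a telomere: the path ends here
                    break
                ext = partner
                cur_map, nxt_map = nxt_map, cur_map
            odd_paths += edges % 2

    # Every extremity not on a path lies on a cycle.
    for start in na:
        ext = start
        if ext in seen:
            continue
        cycles += 1
        while ext not in seen:
            seen.add(ext)
            mate = na[ext]
            seen.add(mate)
            ext = nb[mate]

    return n_genes - (cycles + odd_paths // 2)


# Hypothetical toy genomes: a linear "master" chromosome versus its two fragments.
master = [([1, 2, 3, 4], False)]
fragments = [([1, 2], False), ([3, 4], False)]
print(dcj_distance(master, fragments))
```

In this model a chromosome fission, a circularization or a single inversion each counts as one operation, which is why events such as the fragmentation of a master chromosome are cheap under DCJ.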

Enzymatic mapping of termini 

Approximately 1 µg mtDNA aliquots were treated with
exonuclease III (ExoIII; New England Biolabs) or
BAL-31 nuclease (New England Biolabs) according to
the manufacturer's instructions, for increasing time
periods. After enzyme inactivation (ExoIII for 20 min at
70°C; BAL-31 for 10 min at 65°C in the presence of
EDTA), the mtDNA was precipitated with ethanol,
dissolved in water, digested with a restriction endonucle-
ase and electrophoretically separated in a 0.9% agarose
gel. The labeling of mtDNA termini with T4
polynucleotide kinase was performed essentially as
described previously (20).

DNA hybridization probes 

Southern hybridization of PFGE separated yeast DNA
samples (Figure 1) was performed with a probe containing
an equimolar mixture of PCR products derived from cox2
(345 bp) and nad4 (374 bp) of corresponding species. The
following PCR primers were used:
5'-TAGATGTNCCWACWCCWTGAG-3' and
5'-AYTCRTATTTTCAATATCATTG-3' (cox2);
5'-AGGTATHWTGGTWAARACACC-3' and
5'-CAGGWGAWACDAAWCCATG-3' (nad4).
For C. subhashii, the equivalent PCR primers were
5'-CGTCCCAACACCATGAGG-3' and
5'-ACTCGTACTTCCAGTACCACTG-3' (cox2);
5'-AGGGATCATGGTCAAGACG-3' and
5'-CTGGTGAGACTAGCCCGTG-3' (nad4).
In subsequent experiments, we used the following probes:
P-668 (668 bp fragment amplified by PCR from the
C. frijolesensis mtDNA using primers
5'-ATAATGGGTCAGTGAGTT-3' and
5'-ACGTTCTCTAGCAGTTGA-3'), EH-1350 (1350 bp
EcoRV-HindIII fragment from C. frijolesensis mtDNA),
H-1030 (1030 bp HindIII fragment from C. neerlandica
mtDNA), and Oligo-32 (32 nt oligonucleotide
5'-AATGAGATGAGGAAGTAAAGGGATAAGGATAA-3',
corresponding to a palindrome sequence in C. viswanathii
mtDNA).
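
The cox2 and nad4 primers above contain IUPAC ambiguity codes (N, W, Y, R, H, D), so each primer stands for a pool of oligonucleotides. The following sketch, which assumes nothing beyond the standard IUPAC nucleotide code, counts and enumerates the sequences represented by a degenerate primer; the function names are hypothetical.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer.upper())):
        yield "".join(combo)


# Degenerate primers as listed in the text (5' to 3').
primers = {
    "cox2 forward": "TAGATGTNCCWACWCCWTGAG",
    "cox2 reverse": "AYTCRTATTTTCAATATCATTG",
    "nad4 forward": "AGGTATHWTGGTWAARACACC",
    "nad4 reverse": "CAGGWGAWACDAAWCCATG",
}
for name, seq in primers.items():
    print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

A low degeneracy keeps the effective concentration of each individual oligonucleotide high, which is one reason species-specific primers were used for the more divergent C. subhashii sequences.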

DNA sequence accession numbers 

Mitochondrial DNA sequences described in this work
were deposited in the GenBank nucleotide sequence data
library under the following accession numbers: HQ267968
(C. alai NRRL Y-27739), HM594866 (C. frijolesensis
NRRL Y-48060), GU136397 (C. jiufengensis CBS10846),
EU267175 (C. maltosa CBS5611), EU334437
(C. neerlandica NRRL Y-27057), EF468347 (C. sojae
CBS7871), EF536359 (C. viswanathii CBS4024),
HQ267969 (C. salmanticensis CBS5121).



RESULTS AND DISCUSSION 

PFGE analysis of yeast mitochondrial genomes 

We employed the PFGE approach (11,40) to distinguish 
between polydisperse and uniform linear mtDNA mol-
ecules in samples from 24 yeast species (Table 1 and
Figure 1). In experimental conditions used for the
screening (see 'Materials and Methods' section), the
uniform linear mtDNA molecules from C. subhashii (10)
migrate as a discrete band of ~30 kb (Figure 1, lane 8). In
contrast, a smear is typical for C. albicans [Figure 1, lane
10; (42)] and most other yeast species, with
mtDNA-derived probes revealing a strong signal
between ~20 and 50 kb (Figure 1 and Table 1). This indi-
cates that most examined species contain polydisperse
linear mtDNAs. On average, their lengths are larger
than the genome unit, apparently containing more than
one genome equivalent per molecule. However, in
C. labiduridarum and C. frijolesensis we detected three 
discrete bands migrating in the region between 15 and 
Figure 1. PFGE analysis of the yeast mtDNAs. The whole-cell DNA samples were separated by PFGE using a CHEF Mapper XA Chiller System
(Biorad), blotted onto a nylon membrane and hybridized with mtDNA-derived probes as described in 'Materials and Methods' section. Lane 1:
C. viswanathii CBS 4024; lane 2: C. sojae CBS 7871; lane 3: C. maltosa CBS 5611; lane 4: C. neerlandica NRRL Y-27057; lane 5: C. alai NRRL
Y-27739; lane 6: C. labiduridarum NRRL Y-27940; lane 7: C. frijolesensis NRRL Y-48060; lane 8: C. subhashii CBS 10753; lane 9: C. jiufengensis
CBS 10846; lane 10: C. albicans CBS 562. Note that three discrete bands migrating in the region <50 kb represent three linear mitochondrial
chromosomes in C. labiduridarum and C. frijolesensis (lanes 6 and 7). In contrast, four bands in C. alai (lane 5) do not hybridize with mtDNA probes
and correspond to linear DNA plasmids (data not shown).



50 kb (Figure 1, lanes 6-7). This points to a possibility that 
these species contain three uniform chromosomes in 
mitochondria (see below). Four distinct bands were also 
found in C. alai. However, these bands did not hybridize 
with the mtDNA probe (Figure 1, lane 5) and subsequent
sequence analysis revealed that they correspond to linear
DNA plasmids (to be described elsewhere). 

Mitochondrial genome isomers in species with 
polydisperse mtDNA 

Since the mitochondrial genome of C. albicans occurs in 
two isomers (42,71), we examined the presence of genome 
isomers also in other species with polydisperse mtDNAs. 
Restriction enzyme analysis of the mtDNAs from 
C. maltosa, C. neerlandica and C. sojae identified four 
minor mtDNA fragments (e.g. ~15, ~13, ~9 and ~7kb 
in the case of C. neerlandica mtDNA digested with BamHI 
and PvuII) indicating that they contain a 
circular-mapping genome with large repeated regions 
generating two isomers that are present in a stoichiometric 
ratio (Figure 2A-C). Subsequent sequence analysis con- 
firmed that all three genomes contain large inverted 
repeats (LIRs) that could be involved in the flip-flop re- 
combination generating genome isomers. This is in line 
with the observation that the LIRs represent recombin- 
ation hotspots in C. albicans mtDNA (42). The LIRs 
were also detected in the C. alai mtDNA sequence, but 
the presence of contaminating linear plasmids rendered 



the identification of isomers by restriction enzyme 
analysis inconclusive. 
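
The restriction pattern expected from two flip-flop isomers can be illustrated with a small model. The sketch below assumes a circular map in which recombination between a pair of inverted repeats inverts the unique region lying between them; all coordinates (in kb), the repeat boundaries and the restriction sites are hypothetical toy values, not measurements from the genomes studied here.

```python
from collections import Counter


def circular_fragments(sites, genome_len):
    """Fragment lengths produced by cutting a circular map at the given sites."""
    s = sorted(sites)
    return [
        (s[(i + 1) % len(s)] - s[i]) % genome_len or genome_len
        for i in range(len(s))
    ]


def flip(sites, left, right):
    """Mirror every site inside the open interval (left, right).

    left/right are the inner boundaries of the two inverted repeats, so the
    operation models inversion of the unique region between them.
    """
    return [left + right - p if left < p < right else p for p in sites]


def compare_isomers(sites, genome_len, left, right):
    """Return (shared, only_isomer_I, only_isomer_II) fragment lists."""
    frag_1 = Counter(circular_fragments(sites, genome_len))
    frag_2 = Counter(circular_fragments(flip(sites, left, right), genome_len))
    shared = sorted((frag_1 & frag_2).elements())
    return shared, sorted((frag_1 - frag_2).elements()), sorted((frag_2 - frag_1).elements())


# Hypothetical 40 kb circle; repeats end at 8 kb and start again at 25 kb.
shared, only_1, only_2 = compare_isomers([2, 12, 20, 35], 40, left=8, right=25)
print("shared:", shared, "isomer I:", only_1, "isomer II:", only_2)
```

In such a model each isomer contributes its own pair of fragments spanning the repeats, and the two pairs add up to the same total length. The same relationship holds for the four minor C. neerlandica fragments quoted above (~15 + ~7 kb and ~13 + ~9 kb).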

Physical mapping uncovered genome isomers also in 
mitochondria of C. viswanathii. However, in this case we 
observed two BamHI (~7 and ~3.5 kb) and Eco91I (~5
and ~2.5 kb) bands, with sizes corresponding to a
monomer (lower faint band) and a dimer (upper band).
Southern hybridization indicates that the two fragments 
have the same sequence (Figure 3A), and that the ratio 
between them varied in different preparations (data not 
shown). In this case, the complete mtDNA sequence of 
C. viswanathii contains LIRs arranged as a large palin- 
drome (see below), suggesting that the palindrome (repre- 
sented by the upper band) could be resolved into the 
smaller fragment (the lower band), i.e. a linear mtDNA 
with defined terminal sequences/structures. To support 
this idea we treated isolated mtDNA with BAL-31
nuclease prior to restriction enzyme analysis. This experi- 
ment demonstrated that the lower faint band was the only 
mtDNA fragment sensitive to BAL-31 nuclease activity 
(Figure 3B). On the other hand, this fragment seems to 
be refractory to both exonuclease III and T4 polynucleo- 
tide kinase (data not shown). This indicates that the 
termini of resolved linear mtDNA molecules are protected
by a special arrangement. We presume that, similar to
species from the genera Williopsis and Pichia (44), the
linear mtDNA molecules terminate with single-stranded
covalently closed telomeric hairpins (t-hairpins).
Figure 2. Restriction enzyme analysis reveals circular-mapping genome isomers. Candida neerlandica (A), C. maltosa (B) and C. sojae (C) mtDNAs
were digested with the restriction enzyme combinations BamHI + PvuII, ApaLI + MluI and AgeI + ApaLI, respectively, and electrophoretically
separated in 0.9% (w/v) agarose gel. Black arrows indicate the DNA fragments present in both isomers, grey arrows label the pair of fragments 
specific to the isomers I or II. Schemes illustrate both isomers with the position of inverted repeats (shown bold within the inner circle) and 
corresponding restriction enzyme fragments (the outer circle). 



In contrast, we did not identify genome isomers in 
C. jiufengensis, which lacks LIRs.

Multipartite (fragmented) linear-mapping genomes 

As mentioned above, PFGE analysis revealed the presence 
of three distinct mtDNA bands in C. labiduridarum and 
C. frijolesensis. The size of the largest band (chromosome
I) corresponds approximately to the sum of the middle
(chromosome II) and the smallest (chromosome III)
bands (Figure 1, lanes 6-7), suggesting that the longest
molecules might represent a master chromosome split
into two non-identical fragments. Therefore, we
analyzed PFGE-separated mtDNA molecules of
C. labiduridarum and C. frijolesensis by Southern hybrid-
ization using two probes derived from distant regions of
chromosome I (Figure 4A). The probe P-668 hybridized
with chromosomes I and III and also detected some
mtDNA in wells and smears. The pattern detected by
the probe H-1030 was similar, except that it hybridized
Figure 3. Circular- and linear-mapping genome isomers in mitochondria of C. viswanathii. (A) The mtDNA samples were digested with BamHI
(lane 1) or Eco91I (lane 2) and separated in 1% (w/v) agarose gel. The Southern blot was hybridized with radioactively labeled oligonucleotide probe 
Oligo-32 derived from the large palindrome (shown as dashed arrows). The solid arrows show positions of the palindrome and the presumed terminal 
fragments of resolved linear molecules capped with t-hairpins. Scheme shows presumed circular- (I) and linear-mapping (II) genome isomers. 
(B) Isolated mtDNA was treated or untreated with BAL-31 nuclease (0.2 U for 5min). The mtDNA was then extracted from the reaction, 
digested with BamHI or Eco91I endonuclease, and electrophoretically separated. Note that the fragments containing presumed t-hairpins were 
sensitive to BAL-31 nuclease (indicated by asterisk). 
Figure 4. Multipartite linear-mapping genomes in C. labiduridarum and C. frijolesensis. (A) PFGE separated samples of C. labiduridarum NRRL
Y-27940 (lane 1) and C. frijolesensis NRRL Y-48060 (lane 2) were blotted onto a nylon membrane and hybridized with the radioactively labeled
probes P-668 and H-1030 (regions hybridizing with both probes are shown as dashed lines). Presumed master (I) and two smaller chromosomes
(II and III) are indicated. Note that the master chromosome occurs in four isomers [i.e. L_III-R_III-L_II-R_II (shown in the scheme),
L_III-R_III-R_II-L_II, R_III-L_III-L_II-R_II and R_III-L_III-R_II-L_II; 'L' and 'R' indicate the left and the right telomere, respectively]. The
C. frijolesensis mtDNA (~1 µg) was digested with BAL-31 nuclease (B) or exonuclease III (ExoIII) (C) as indicated. After nuclease inactivation,
the DNA was digested with EcoRV and separated in 0.9% (w/v) agarose gel. The Southern blots were hybridized with the P-668 and EH-1350 probes
specific for the left and the right arm of the master chromosome, respectively (see 'Materials and Methods' section). Arrows show the positions of the
left (L) and right (R) terminal fragments and their fusions (R + R, R + L and L + L). Note that after ExoIII treatment the telomeric fragments form
two subpopulations that differ in their sensitivity to the ExoIII treatment. This indicates that the linear mtDNA molecules possess either an open structure
with a 5' overhang or blunt end, or a covalently closed t-hairpin. (D) The C. frijolesensis mtDNA was treated with antarctic phosphatase and labeled with
[γ-32P]ATP and T4 polynucleotide kinase. The mtDNA was then digested with restriction endonuclease EcoRV (lane 1) or BglII (lane 2) and
separated in 0.8% (w/v) agarose gel (left panel). The gel was fixed in 10% (v/v) methanol/10% (v/v) acetic acid for 30 min, dried overnight and
autoradiographed (right panel). Arrows indicate the position of telomeric fragments containing the open structures accessible to terminal labeling.



with chromosome II instead of chromosome III. To
confirm that all three chromosomes are linear, we
treated C. frijolesensis mtDNA with BAL-31 nuclease,
exonuclease III, and T4 polynucleotide kinase. We
observed that the presumed terminal restriction enzyme
fragments were all sensitive to BAL-31 nuclease
(Figure 4B). Interestingly, the treatment with exonuclease
III revealed two subpopulations of terminal fragments dif-
fering in their accessibility to the enzyme activity
(Figure 4C). This indicates that the ends of the linear
mtDNA molecules might adopt a covalently closed struc-
ture such as a t-hairpin, which in a fraction of molecules is
opened and thus accessible to exonuclease III and T4
polynucleotide kinase (Figure 4D). Southern hybridiza-
tion revealed four faint restriction fragments of the
mtDNA (designated as L_II + L_III, R_II + R_III, L_II + R_III,
L_III + R_II) that are refractory to all three enzymes
(Figure 4B-D). The fragment sizes correspond to junc-
tions between chromosomes II and III, and their
presence shows that the master chromosome occurs in
four flip-flop isomers (i.e. L_III-R_III-L_II-R_II,
L_III-R_III-R_II-L_II, R_III-L_III-L_II-R_II and
R_III-L_III-R_II-L_II). This suggests that fragmented
linear-mapping genomes (i.e. uniform linear mtDNAs
Table 2. Genes present in the mitochondrial genomes sequenced in this study

Columns: protein subunits of oxidative phosphorylation complexes I (nad1, nad2, nad3, nad4, nad4L, nad5, nad6), III (cob), IV (cox1, cox2, cox3) and V (atp6, atp8, atp9); protein synthesis and RNA processing: ribosomal protein (rps3), rRNA (rns, rnl), tRNA (trn) and RNase P RNA (rnpB).

Species                    tRNA genes (trn)
Candida alai               24 + 1(a)
Candida frijolesensis      24
Candida jiufengensis       24
Candida maltosa            24 + 2(a)
Candida neerlandica        24
Candida salmanticensis     24
Candida sojae              24 + 5(a)
Candida viswanathii        24 + 1(a)

(a) Duplicated within LIRs.




with resolved termini corresponding to chromosomes 
I-III) may coexist with circular-mapping genome forms 
(i.e. polydisperse linear mtDNAs lacking homogeneous 
terminal structures that correspond to the smear 
observed in PFGE). 
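
The four flip-flop isomers of the master chromosome follow from the two possible orientations of each of its two arms. The short sketch below enumerates them and the corresponding junction fragments; the labels (L_III, R_II and so on) follow the notation used in the text, while the function names are hypothetical.

```python
from itertools import product


def master_isomers():
    """Enumerate master-chromosome isomers as tuples of telomere labels.

    Each arm (chromosome III, then chromosome II) can lie in either
    orientation, giving 2 x 2 = 4 arrangements.
    """
    orientations = (("L", "R"), ("R", "L"))
    isomers = []
    for arm_3, arm_2 in product(orientations, repeat=2):
        isomers.append(
            tuple(f"{end}_III" for end in arm_3) + tuple(f"{end}_II" for end in arm_2)
        )
    return isomers


def junction(isomer):
    """The two telomere-derived ends fused at the III/II junction."""
    return frozenset(isomer[1:3])


for iso in master_isomers():
    print("-".join(iso), "junction:", " + ".join(sorted(junction(iso))))
```

Each isomer yields a different junction, which is consistent with the detection of four distinct nuclease-resistant junction fragments.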

The occurrence of multipartite genomes raises a 
question concerning the distribution of mtDNA molecules 
during cell division. Since we do not have any evidence for 
a specific segregation machinery analogous to the mitotic 
spindle ensuring the proper segregation of individual 
chromosomes in mitochondria, we propose that 
the presumed circular-mapping genome and/or chromosome
I may represent the 'master copies' playing a key role in 
the genome transmission. 

Genetic organization of the mitochondrial genomes 

With the aim to investigate the mitochondrial genome 
architecture in more detail and to identify sequence and/ 
or structural features involved in the genome architecture 
alterations, we determined the complete mtDNA se-
quences of eight yeast species: C. alai, C. frijolesensis,
C. jiufengensis, C. maltosa, C. neerlandica,
C. salmanticensis, C. sojae and C. viswanathii
(Supplementary Figure S1). The sizes of sequenced
mtDNAs range from 25.7 (C. salmanticensis) to 62.9 kb 
(C. maltosa), and their G + C content varies from 20.5 
(C. salmanticensis) to 19. Wa (C. sojae) (Table 1). The 
genomes contain essentially the same set of conserved 
genes including the genes for subunits of ATP synthase
(atp6, 8, 9), apocytochrome b (cob), cytochrome c oxidase
(cox1, 2, 3), NADH:ubiquinone oxidoreductase
(nad1, 2, 3, 4, 4L, 5, 6), large and small ribosomal RNA (rnl
and rns) and a complete set of transfer RNAs (trn). The
C. salmanticensis mtDNA has two additional genes, i.e.
rps3/var1 coding for a subunit of the mitochondrial
ribosome and rnpB/rpm1 for the RNA subunit of RNase
P (Table 2). One or more genes are duplicated in
C. maltosa, C. sojae and C. viswanathii mtDNAs as they
are localized within the LIRs.

The presence of tRNA(Trp) with a UCA anticodon, and
the absence of a homolog of the abnormal S. cerevisiae
tRNA(Thr) with 8 nt in the anticodon loop (decoding CUN
as threonine), indicate that UGA and CUN codons are
recognized as tryptophan and leucine, respectively. The
codon assignment was verified by multiple alignments of
protein sequences, which led us to the conclusion that
UGA(Trp) is the only deviation from the standard
genetic code.
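
The effect of this codon assignment can be shown with a small translation sketch. It builds the standard genetic code from the usual compact 64-letter string and derives translation table 4 by reassigning UGA to tryptophan, the single difference in amino acid assignment between the two tables. The helper names and the toy sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(reassign=None):
    """Return a codon -> amino acid dict, optionally with reassigned codons."""
    table = {
        "".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
    }
    table.update(reassign or {})
    return table


STANDARD = codon_table()
TABLE_4 = codon_table({"TGA": "W"})  # UGA read as Trp instead of stop


def translate(seq, table):
    """Translate a nucleotide sequence codon by codon ('*' marks a stop)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(table[seq[i:i + 3]] for i in range(0, usable, 3))


toy = "ATGTGACTTTAA"  # hypothetical fragment containing a UGA and a CUU codon
print(translate(toy, STANDARD), translate(toy, TABLE_4))
```

Under table 4 the CUN codons keep their standard meaning (leucine), in agreement with the assignment inferred above.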

Introns were identified in cob, cox1, nad5 and rnl genes
(Table 3). Their number and distribution among species 
vary from one (in C. alai) to 10 (in C. frijolesensis). The 
introns predominantly belong to group I and in many 
cases contain an open reading frame (ORF) coding for 
putative LAGLIDADG and GIY-YIG type endonucle- 
ases (group I) or reverse transcriptases/maturases (group 
II introns). 

Our previous studies (46,51) revealed that 
C. metapsilosis, C. orthopsilosis and C. parapsilosis have 
the same gene order in their mtDNAs. Likewise, 
C. frijolesensis, C. neerlandica and C. viswanathii have
the same genetic organization, except that trnM1 has
been duplicated and inverted in the last of these species
(Figure 5A). All other species examined exhibit unique
gene arrangements, with synteny reduced to four
conserved gene clusters (i.e. trnN-atp6, cox1-atp9,
cob-nad3 and rnl-cox2) in C. jiufengensis versus
C. parapsilosis (Figure 5B).

LIRs and the genome architecture 

Analysis of the collected yeast mtDNA sequences reveals
that the most prominent feature of the genome architec-
ture is the presence of relatively long duplications,
arranged as inverted repeats (LIRs). These elements are
present in all but one (C. jiufengensis) mtDNA, and their
lengths vary from 109 bp (present as the subterminal
repeats in the linear mtDNA of C. salmanticensis) to
14 379 bp (in C. maltosa), thus substantially expanding
the genome length (Supplementary Figure S1). In most
cases, LIRs comprise non-coding sequences or contain
only a few genes or gene fragments. In C. sojae, the
8658 bp LIRs represent a block duplication of 11 genes
(i.e. trnA, cox2, trnM1, cob, trnM2, rns, trnI, atp9,
trnR1, nad2, nad3). In most cases, the pairs of LIRs are
separated by long unique regions. However, we noticed
two special arrangements of LIRs: (i) in C. viswanathii
identical copies of 4162 bp inverted repeats separated by
Table 3. Identified intron sequences

Species                  Gene   Intron   Intron group       Intronic ORFs
Candida alai             cox1   aI1      IB
Candida frijolesensis    cob    bI1      I
                                bI2      IA
                                bI3      IB                 orf5 (LAGLIDADG1 endonuclease)
                         cox1   aII1     II (domainV)       orf1 (reverse transcriptase/maturase; HNHc domain)
                                aI1      IB                 orf2 (LAGLIDADG1 endonuclease)
                                aI2      IB                 orf3 (LAGLIDADG1 endonuclease)
                                aI3      I                  orf4 (LAGLIDADG2 endonuclease)
                                aI4      IB
                         nad5   nd5I1    I                  orf6 (LAGLIDADG2 endonuclease)
                                                            orf7 (LAGLIDADG2 endonuclease)
                         rnl    rI1      IA
Candida jiufengensis     cob    bI1      ID                 orf1 (GIY-YIG endonuclease)
                         cox1   aI1      IA (derived)
                                aI2      ID                 orf1 (LAGLIDADG1 endonuclease)
                                aI3      IB
                                aI4      IB
                                aI5      IB                 orf3 (LAGLIDADG endonuclease; truncated)
                                aI6      IA (derived)
                         rnl    rI1      IA (derived)
                                rI2      IA (derived)
Candida maltosa          cob    bI1      ID                 orf2 (GIY-YIG endonuclease)
                                bI2      IA (derived)
                         cox1   aI1      IB                 orf1 (LAGLIDADG1 endonuclease)
                                aI2      IB2 (derived)
                         rnl    rI1      IA
                                rI2      IA (derived)
Candida neerlandica      cob    bI1      I
                                bI2      IA
                                bI3      IB                 orf3 (LAGLIDADG1 endonuclease)
                         cox1   aI1      IB                 orf1 (LAGLIDADG1 endonuclease)
                                aI2      IB1 (derived)
                                aI3      IB1 (derived)      orf2 (LAGLIDADG1 endonuclease)
                         nad5   nd5I1    IB                 orf4 (LAGLIDADG2 endonuclease)
                         rnl    rI1      IA
Candida salmanticensis   cob    bI1      ID                 orf1 (GIY-YIG endonuclease)
                         cox1   aI1      IB1 (derived)
                         nad5   nd5I1    IB2 (derived)
                         rnl    rI1      IC2
                                rI2      I
Candida sojae            cox1   aI1      IB                 orf1 (LAGLIDADG1 endonuclease)
                                aII1     II (domainV)       orf1 (reverse transcriptase; HNHc domain)
Candida viswanathii      cob    bII1     II (domainV)       orf3 (reverse transcriptase/maturase; HNHc domain)
                         cox1   aI1      IB (3', partial)   orf1 (LAGLIDADG1 endonuclease)
                                aI2      IB
                                aI3      IB                 orf1 (LAGLIDADG1 endonuclease)
                         nad5   nd5I1    IB

a 798 bp non-coding sequence form a large palindrome 
and (ii) the two different inverted repeats (734 and 
1229 bp) are separated by a 228 bp non-coding sequence. 
The second arrangement is present in the region of 
C. frijolesensis chromosome I, corresponding to the 
junction of chromosomes II and III. Since the two 
smaller chromosomes possess different LIRs at their 
termini, the chromosome I contains different sequences 
at the opposite ends. 

As mentioned above, we demonstrated the presence of
genome isomers in C. frijolesensis, C. labiduridarum,
C. maltosa, C. neerlandica, C. sojae and C. viswanathii,
but not in C. jiufengensis, which lacks the LIRs, or in
C. salmanticensis, which has the LIRs in subterminal
regions of the linear mtDNA extended by t-arrays
(2n × 104 bp).



Figure 5. Comparison of mitochondrial gene orders among species from the
C. neerlandica-C. tropicalis (A), L. elongisporus-C. parapsilosis (B) and
C. subhashii-C. alai (C) lineages. Individual genes and blocks with
conserved gene order are shown by identical colors. Duplicated regions are
framed. The symbols wedge and caret indicate the orientation of genes, TEL
(telomeres) and LIR. In L. elongisporus, both LIRs (LIR*) consist of two
regions of ~4 kb separated by 574 and 95 bp-long unique sequences.

Since the LIRs represent a suitable substrate for homologous
recombination generating the genome isomers, the recombination
transactions may be implicated in alterations of the genome
architecture, which may in turn depend on LIR arrangements. We note
that the arrangement of LIRs in the mtDNA sequences correlates with the
mitochondrial genome architecture. While C. maltosa, C. neerlandica and
C. sojae have two circular-mapping genome isomers, C. viswanathii
contains circular- and linear-mapping isomers, and C. frijolesensis
possesses circular- and multipartite linear-mapping genome forms. This
suggests that specifically arranged LIR copies (such as in
C. viswanathii and C. frijolesensis) play a role as resolution
elements, allowing interconversion between the circular- and
linear-mapping genome forms (C. viswanathii), eventually leading to
genome fragmentation into multiple linear chromosomes
(C. frijolesensis).
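
How recombination between two inverted LIR copies yields a second genome
isomer can be illustrated with a short sketch. It assumes a deliberately
simplified model: the genome is a list of (name, strand) segments, and a
single crossover between the two inverted copies inverts everything lying
between them. The segment names and the function name flip_flop_isomer are
hypothetical and do not reproduce any of the sequenced genomes.

```python
def flip_flop_isomer(order, repeat="LIR"):
    """Return the isomer produced by recombination between two inverted repeats.

    `order` is a list of (name, strand) tuples with strand +1 or -1.
    The segment between the two repeat copies is reversed and its strands
    are flipped; everything outside the repeats is left unchanged.
    """
    positions = [i for i, (name, _) in enumerate(order) if name == repeat]
    if len(positions) != 2:
        raise ValueError("expected exactly two copies of the repeat")
    i, j = positions
    if order[i][1] == order[j][1]:
        raise ValueError("the two repeat copies must be in inverted orientation")
    # Invert the unique region enclosed by the repeats.
    inner = [(name, -strand) for name, strand in reversed(order[i + 1:j])]
    return order[:i + 1] + inner + order[j:]


# Hypothetical toy genome: two unique regions separated by inverted LIR copies.
genome = [("LIR", 1), ("cox1", 1), ("cob", 1), ("LIR", -1), ("rnl", 1)]
isomer = flip_flop_isomer(genome)
print(isomer)
```

Applying the function twice returns the starting gene order, which matches
the idea of two interconverting isomers.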

The comparison of C. neerlandica, C. viswanathii and
C. frijolesensis underlines the presumed role of LIRs in the
genome architecture. All three species are phylogenetically
closely related, with essentially the same mitochondrial
genome organization. However, they differ in LIR arrangements
and genome architecture (i.e. circular-mapping; circular- and
linear-mapping; and circular- and multipartite linear-mapping,
respectively).

Large palindromes are structural elements suitable for resolution of
uniform linear molecules from circular and/or linear polydisperse
mtDNAs. In general, such sequences are known hotspots of genomic
instability due to their inherent ability to form cruciform or hairpin
structures resulting in DNA replication stall sites (72-74). While in
Escherichia coli, palindromic sequences cause double-stranded breaks
induced by SbcCD complex (75), in the spirochete Borrelia the
palindromes are processed by telomere resolvase (ResT) into t-hairpins
(76). The latter mechanism parallels the palindrome resolution involved
in the conversions of circular replication intermediates into
linear-mapping mtDNAs in Williopsis and Pichia species (44) as well as
in the formation of linear mtDNA monomers from linear and circular
dimeric replication intermediates in the ciliate Paramecium (77) and
the crustacean Armadillidium vulgare (78), respectively. Our results
indicate that the palindrome in the C. viswanathii mtDNA is resolved
into t-hairpins, suggesting that linear mtDNA molecules with defined
termini are generated during this process. On the basis of PFGE
analysis (Figure 1, lane 1), we assume that the fraction of
polydisperse mtDNA molecules fully processed into uniform mtDNA
monomers is relatively low. In contrast, we detected three linear
chromosomes, a smear of polydisperse mtDNA molecules and flip-flop
isomers of chromosome I in C. frijolesensis samples. This indicates
that circular-mapping genome forms are processed into chromosome I and
further resolved into two smaller chromosomes.
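
A palindrome of the kind discussed here (two inverted repeat arms flanking
a unique spacer) can be recognized computationally by extending outwards
from the spacer for as long as the flanking bases remain complementary.
The following is a minimal sketch under strong assumptions: it only
handles perfect arms, uses a short made-up sequence rather than any real
mtDNA, and the function name palindrome_arm_length is hypothetical.
Diverged repeats would need a mismatch-tolerant search.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def palindrome_arm_length(seq, spacer_start, spacer_end):
    """Length of the perfect inverted-repeat arms flanking a spacer.

    The spacer is seq[spacer_start:spacer_end]. The arms are extended
    outwards one base at a time while the base on the left is the
    complement of the base on the right.
    """
    length = 0
    left, right = spacer_start - 1, spacer_end
    while left >= 0 and right < len(seq) and seq[left] == COMPLEMENT[seq[right]]:
        length += 1
        left -= 1
        right += 1
    return length


# Toy example (not real sequence data): a 6 bp arm, a 3 bp spacer, and the
# reverse complement of the arm, embedded in unrelated flanking bases.
arm, spacer = "ACGTTG", "AAA"
toy = "GG" + arm + spacer + reverse_complement(arm) + "GG"
print(palindrome_arm_length(toy, 8, 11))  # 6
```

With the values reported for C. viswanathii, the same layout would
correspond to 4162 bp arms around a 798 bp spacer.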

Terminal inverted repeats appear to be a typical feature of
linear-mapping mitochondrial genomes occurring in phylogenetically
distant organisms (9,10,15,19-21,24,25,27,34,38,44,51,79,80),
indicating that they arose by analogous evolutionary trajectories.
These repeats usually consist of non-coding sequences, and sometimes a
few genes. The linear-mapping mitochondrial genome of the stramenopile
Proteromonas lacertae even possesses 15.6 kb-long terminal LIRs
containing about two-thirds of the genes (21). Therefore, we assume
that the terminal LIRs are remnants of resolution elements that emerge
from segmental duplications of the mitochondrial genome. Alternatively,
they may derive from invertrons such as linear mitochondrial DNA
plasmids that are known to integrate into mtDNAs (10).

Phylogenetic analysis 

We took advantage of the mtDNA-derived data and analyzed the
phylogenetic relationships of the investigated yeast species by
Bayesian and maximum likelihood methods. All three methods resulted
essentially in the same tree topology. The tree calculated by
PhyloBayes (Figure 6) is supported by high posterior probabilities on
most branches and is consistent with the study of Fitzpatrick et al.
(53) indicating that the monophyletic 'CTG clade' splits into two major
lineages: the first represented by D. hansenii and P. sorbitophila, and
the second by the C. albicans-C. parapsilosis group. Incorporation of
additional recently described species (81-84) in the phylogenetic
analysis revealed a more detailed relationship among species in the
latter lineage. This lineage splits into three subgroups (i.e.
L. elongisporus-C. parapsilosis, C. maltosa-C. tropicalis and
C. subhashii-C. alai), each containing species with circular- and
linear-mapping mtDNAs. The occurrence of different types of
mitochondrial telomeres (i.e. t-arrays in C. metapsilosis,
C. orthopsilosis and C. parapsilosis; t-hairpins in C. viswanathii and
C. frijolesensis; invertron-like telomeres with a t-protein in
C. subhashii) in each subgroup is consistent with the tree topology.
Similar to C. parapsilosis, the linear mitochondrial genome of
C. salmanticensis terminates with t-arrays, although the sequence of
its mitochondrial telomeres is different. Since C. salmanticensis
belongs to early branching hemiascomycete lineages, its linear
mitochondrial genome emerged independently of the linear mtDNAs in
species from the 'CTG clade', presumably by employing similar molecular
mechanism(s).

Figure 6. Phylogenetic tree based on mtDNA-encoded proteins. The
phylogenetic tree was calculated from multiple sequence alignments of
mitochondrial proteins by PhyloBayes (65). Posterior probabilities are
shown at corresponding branches. The mitochondrial genome forms were
reported elsewhere (10,20,22,51,55-57,60,86-89) or analyzed in this
study. C, circular-mapping genome; L1, L2 and L3 indicate the type of
linear-mapping genomes according to the telomeric structures, i.e.
t-hairpins, t-arrays and invertron-like with t-proteins, respectively;
3×L1, tripartite linear-mapping genome with t-hairpins (see Table 1 for
details).

Reconstruction of ancestral mitochondrial genomes 

Our previous reports (10,46) as well as the comparison of mtDNAs
examined in this study revealed a number of conserved gene clusters.
This prompted us to use the gene orders of 15 species from the 'CTG
clade' for reconstruction of possible ancestral mitochondrial genomes
in corresponding nodes of the phylogenetic tree (Figure 7), using the
DCJ model (69) and local optimization procedures (70). For example, the
presumed ancestor of C. parapsilosis and C. jiufengensis, which differ
by the genome form, had a circular-mapping genome. We propose a simple
evolutionary scenario leading to the linear-mapping mitochondrial
genome (Figure 8). In this scenario, a resolution of a recombination
transaction between the gene pairs cox2-trnN and cob-atp9 results in
the formation of mtDNA with the gene order observed in circularized
mutants of C. orthopsilosis and C. metapsilosis, and its subsequent
linearization between the genes nad3 and atp6 generates a linear mtDNA
with the genetic organization observed in the linear-mapping
mitochondrial genomes of C. metapsilosis, C. orthopsilosis and
C. parapsilosis. In contrast, recombination between the gene pairs
rnl-cox1 and atp6-nad3 in the presumed ancestral genome leads to the
identical arrangement of genes as is present in the C. jiufengensis
mtDNA.
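
To make the DCJ (double cut and join) model more concrete, the sketch
below computes the DCJ distance between two gene orders. It is an
illustration only and not the reconstruction procedure used in this
study: it assumes single circular chromosomes with identical gene
content, in which case the distance reduces to the number of genes minus
the number of cycles in the adjacency graph. Genes are written as signed
integers (the sign gives the orientation); the example gene orders and
the function names are hypothetical.

```python
def adjacencies(order):
    """Map every gene extremity to its neighbour in a circular signed gene order."""
    def left(g):   # extremity met first when reading gene g
        return (abs(g), "t" if g > 0 else "h")

    def right(g):  # extremity met last when reading gene g
        return (abs(g), "h" if g > 0 else "t")

    partner = {}
    n = len(order)
    for i in range(n):
        a, b = right(order[i]), left(order[(i + 1) % n])
        partner[a] = b
        partner[b] = a
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance between two circular genomes with the same genes."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("genomes must contain the same genes")
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    seen, cycles = set(), 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle, alternating adjacencies of genome A and genome B.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return len(genome_a) - cycles


ancestor = [1, 2, 3, 4]     # hypothetical ancestral gene order
derived = [1, -3, -2, 4]    # same genes after one inversion
print(dcj_distance(ancestor, derived))  # 1
```

A single recombination between two gene pairs, as in the scenario
proposed above, corresponds to one DCJ operation in this model.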



On the origin of 'true' linear mitochondrial genomes 

In a number of species, replication of linear-mapping genomes relies on
circular intermediates (monomers or dimers) generating linear
concatemers via rolling circle and/or recombination-dependent
replication mechanisms (44,45). In contrast, no genome-sized circles or
genome concatemers were detected in mitochondria of C. parapsilosis and
C. subhashii, which harbor uniform linear mtDNA molecules terminating
with t-arrays and t-proteins, respectively (10,20,46). Hence, these
linear-mapping genomes can be considered as 'true' linear genomes. This
is further underlined by the presence of active telomere maintenance
pathways ensuring their complete replication. We posit that
linear-mapping genomes with terminal structures such as t-hairpins
correspond to a transient state between circular-mapping and 'truly'
monomeric linear mitochondrial genomes. T-hairpins formed at linear
mtDNA termini provide substrates for terminus elongation by an active
telomere maintenance pathway [e.g. recombination-dependent mechanism
operating in C. parapsilosis mitochondria (49,50)]. Once this pathway
ensures the stability of a linear genome, circular replication
intermediates and/or polydisperse mtDNAs become dispensable for the
system. Conversely, a defect in the telomere-maintenance pathway may
result in intramolecular end-to-end fusion, thus re-establishing the
original circular-mapping mitochondrial genome architecture (Figure 9).

Figure 7. Reconstruction of ancestral genomes. The figure shows possible
ancestral gene orders and the number of events on each branch found by
local optimization for the DCJ model. The intervals show the range of
numbers of events in equally parsimonious histories. Red connectors in
the gene orders for present-day and ancestral genomes represent inferred
breakpoints on the branch to the nearest ancestor. Due to space
constraints, the figure omits tRNA genes (even though the reconstructions
were performed including tRNAs); full gene orders including tRNAs are
shown in Supplementary Figure S2. The figure includes duplicated genes,
which were restored after ancestral gene order reconstruction. Note that
both the linear and the circularized (mutant) mitochondrial genome forms
of C. orthopsilosis were used in the analysis (51,60).

Figure 8. A hypothetical pathway leading to mitochondrial genomes of
C. parapsilosis and C. jiufengensis from the most recent common ancestor.
We propose a simple scenario allowing delineation of the gene order found
in both the circular-mapping genome of C. jiufengensis and the 'true'
linear genome of C. parapsilosis from a reconstructed circular-mapping
ancestor inferred by the analysis shown in Figure 7 and Supplementary
Figure S2. The process includes reciprocal recombination events between
the gene pairs (i) rnl/cox1 and nad3/atp6 or (ii) cox1/trnN and cob/atp9,
followed by opening of the circular-mapping genome between the genes nad3
and atp6 in the latter case. Note that the circular-mapping genome
intermediate prior to its linearization has the same gene order as the
mitochondrial telomere mutants of C. metapsilosis and C. orthopsilosis
(51,52).

Figure 9. A hypothesis on the origin of linear chromosomes in yeast
mitochondria. (A) A circular-mapping genome represented by linear
polydisperse mtDNAs with [e.g. C. glabrata, S. cerevisiae (40)] or
potentially without [e.g. C. albicans (42)] a fraction of circular
molecules. (B) Genome rearrangements may result in the formation of a
palindrome allowing the resolution of a circular-mapping genome into
linear chromosomes with defined terminal structures such as t-hairpins.
Such genomes were observed in several species [e.g. P. pijperi and
W. mrakii (11,44), C. viswanathii] containing uniform linear mtDNAs, with
t-hairpins resolved from circular molecules (monomers and dimers) and/or
linear polydisperse mtDNAs. (C) Multiple resolution elements (i.e. two
types of LIRs) allow the genome fragmentation into multiple linear
chromosomes (e.g. C. frijolesensis, C. labiduridarum). (D) The termini of
the linear chromosomes may provide a substrate for further elongation via
active maintenance mechanisms, such as the t-circle dependent pathway
observed in 'true' linear genomes (e.g. C. metapsilosis, C. orthopsilosis,
C. parapsilosis, C. salmanticensis) (49,50). Defects in the telomere
maintenance result in the genome circularization via end-to-end fusion, as
in mitochondrial telomere mutants of C. metapsilosis and C. orthopsilosis
containing circular-mapping genomes (51,52).